Cross- vs Within-Company Defect Prediction Studies
Abstract
In a recent May 2007 IEEE TSE article, Kitchenham et al. explored effort estimation and found contradictory evidence about the value of cross- vs. within-company data. Those contradictory results may have stemmed from the effort estimation features themselves, some of which are subjective in nature. Static code features are different from effort estimation features: they can be generated in an automatic, rapid, and uniform manner across multiple projects. Therefore, in theory, the conclusions reached from such features may be more uniform. This paper tests that theory by searching for uniform conclusions using cross- or within-company static code features. Whereas Kitchenham et al. explored effort estimation, this paper explores defect prediction. We find that cross-company static code features generate higher false alarm rates than within-company data. Hence, cross-company data is best used for mission-critical software, where (a) the extra costs associated with high false alarm rates are compensated by (b) an associated increase in the probability of predicting faulty modules. For other classes of software, false alarm rates can be decreased using a very small amount of local data (often, just 100 modules). In our experiments, the use of within-company data halved the false alarm rate while decreasing prediction rates by only ≈ 10%. Hence, for non-mission-critical software, we strongly recommend using within-company data for defect prediction.
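The trade-off described in the abstract rests on two rates: the probability of detecting a defective module and the false alarm rate. The following is a minimal sketch of how those two rates are commonly computed from a confusion matrix. The function name `detection_and_false_alarm` and the label vectors are hypothetical and purely illustrative; they are not data or code from the paper.

```python
def detection_and_false_alarm(actual, predicted):
    """Return (pd, pf) for binary defect labels (1 = defective, 0 = clean).

    pd = TP / (TP + FN): share of defective modules that were flagged.
    pf = FP / (FP + TN): share of clean modules that were wrongly flagged.
    """
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    return pd, pf


# Illustrative labels only (assumed, not taken from the study).
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
pd, pf = detection_and_false_alarm(actual, predicted)
print(f"pd={pd:.2f}, pf={pf:.2f}")
```

A predictor with a higher pd and a higher pf, as the abstract reports for cross-company data, flags more truly defective modules at the price of more wasted inspections of clean ones.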
Similar resources
Using Class Imbalance Learning for Cross-Company Defect Prediction
Cross-company defect prediction (CCDP) is a practical approach that trains a prediction model on one or more projects of a source company and then applies the model to a target company. Unfortunately, the performance of such CCDP models is susceptible to the highly imbalanced distribution of defect-prone and non-defect-prone classes in CC data. Class imbalance learning is applied to alleviat...
Transfer learning for cross-company software defect prediction
doi:10.1016/j.infsof.2011.09.007. Context: Software defect prediction studies usually built models using within-company data, but very few focused on the prediction models trained with cross-company da...
A Data Filtering Method Based on Agglomerative Clustering
Cross-company defect prediction (CCDP) is a practical approach that trains a prediction model on one or more projects of a source company and then applies the model to a target company. Unfortunately, a large amount of irrelevant cross-company (CC) data usually makes it difficult to build a cross-company defect prediction model with high performance. To address such issues, this paper proposes a dat...
A Multi-Source TrAdaBoost Approach for Cross-Company Defect Prediction
Cross-company defect prediction (CCDP) is a practical approach that trains a prediction model on one or more projects of a source company and then applies the model to a target company. Unfortunately, a large amount of irrelevant cross-company (CC) data usually makes it difficult to build a prediction model with high performance. On the other hand, brute force leveraging of CC data poorly related t...
Negative samples reduction in cross-company software defects prediction
Context: Software defect prediction has been widely studied based on various machine-learning algorithms. Previous studies usually focus on within-company defects prediction (WCDP), but lack of training data in the early stages of software testing limits the efficiency of WCDP in practice. Thus, recent research has largely examined the cross-company defects prediction (CCDP) as an alternative s...
Publication date: 2007